Sequencing and Raw Sequence Data Quality Control ◾ 3
1.2 SEQUENCING
DNA/RNA sequencing is the determination of the order of the four nucleotides in a nucleic
acid molecule. The resulting order of nucleotides, for example in the genome of an organism,
is called a sequence. DNA sequencing helps scientists investigate gene functions, the roles
of mutations in traits and diseases, species identification, evolutionary relationships
between species, the diagnosis of diseases caused by genetic factors, the development of
gene therapy, criminal investigations and legal problems, and more. Since nucleotides are
distinguished by their bases, bioinformatics represents DNA and RNA sequences as strings of
single-character base symbols: A, C, G, and T for DNA, and A, C, G, and U for RNA.
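The following is a minimal Python sketch of this string representation. It checks that a sequence uses only the four allowed symbols, counts each base, and rewrites a DNA sequence in the RNA alphabet by replacing T with U. The helper names `is_valid`, `base_counts`, and `dna_to_rna` are hypothetical and do not come from any particular library. The sketch also assumes unambiguous bases only, so it does not handle ambiguity codes such as N.

```python
from collections import Counter

# Allowed single-character nucleotide symbols (unambiguous bases only; an assumption).
DNA_ALPHABET = frozenset("ACGT")
RNA_ALPHABET = frozenset("ACGU")


def is_valid(seq: str, alphabet: frozenset) -> bool:
    """Return True if every symbol in seq (case-insensitive) belongs to the alphabet."""
    return all(base in alphabet for base in seq.upper())


def base_counts(seq: str) -> dict[str, int]:
    """Count how many times each nucleotide symbol occurs in seq."""
    return dict(Counter(seq.upper()))


def dna_to_rna(dna: str) -> str:
    """Rewrite a DNA sequence with the RNA alphabet by replacing T with U.

    The nucleotide order is unchanged, so the result is the RNA counterpart
    of the same strand (for example, a coding strand and its transcript).
    """
    if not is_valid(dna, DNA_ALPHABET):
        raise ValueError(f"not a DNA sequence: {dna!r}")
    return dna.upper().replace("T", "U")


print(dna_to_rna("ATGCTT"))     # AUGCUU
print(base_counts("GATTACA"))   # {'G': 1, 'A': 3, 'T': 2, 'C': 1}
```

Plain strings are used here because a sequence is simply an ordered list of base symbols. Dedicated sequence types in bioinformatics libraries add metadata and more alphabets, but they rest on the same idea.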
Attempts to sequence nucleic acids began soon after the landmark discovery of the
double-helix structure of DNA by James Watson and Francis Crick in 1953. The first nucleic
acid to be sequenced was alanine tRNA, sequenced in 1965 by the Nobel Prize winner Robert
Holley. Holley used two ribonuclease enzymes to cleave the tRNA at specific nucleotide
positions and then determined the order of the nucleotides manually [1]. The first complete
gene was sequenced in 1972 by Walter Fiers: the gene that codes for the coat protein of
bacteriophage MS2, an RNA virus. The sequencing was done by using enzymes to break the
bacteriophage RNA into pieces and separating the fragments by electrophoresis and
chromatography [2]. The sequencing of alanine tRNA by Robert Holley and of the bacteriophage
MS2 coat protein gene are among the major milestones in the history of genomics and DNA
sequencing. They paved the way for first-generation sequencing.
1.2.1 First-Generation Sequencing
First-generation sequencing emerged in the late 1970s. The American biologists Allan M.
Maxam and Walter Gilbert developed a chemical method for sequencing, and the English
biochemist Frederick Sanger developed the chain-terminator method. The Sanger method became
the more widely used of the two first-generation methods and is still in use today. Both
methods were used in shotgun sequencing, which involves breaking a genome into DNA fragments
and sequencing each fragment individually. The genome sequence is then assembled by aligning
the fragment sequences and joining them at their overlaps.
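The overlap-based assembly described above can be illustrated with a toy greedy assembler. This is a simplified technique chosen for the example; the text does not specify it. The sketch repeatedly merges the two fragments with the longest exact suffix-prefix overlap.

It makes several assumptions:

- Fragments are error-free and come from a single strand.
- The genome has no long repeats.
- Fragments are cut at evenly spaced positions instead of randomly.

The names `shred`, `overlap`, and `greedy_assemble` and the `min_len` threshold are hypothetical.

```python
from itertools import permutations


def shred(genome: str, read_len: int, step: int) -> list[str]:
    """Cut a genome into overlapping fragments of length read_len, one every `step` bases.

    Idealized stand-in for random shotgun fragmentation: fragments are evenly
    spaced and error-free, and the last fragment always reaches the genome's end.
    """
    starts = list(range(0, max(len(genome) - read_len, 0) + 1, step))
    if starts[-1] + read_len < len(genome):
        starts.append(len(genome) - read_len)
    return [genome[s:s + read_len] for s in starts]


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`; 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge the pair of fragments with the longest overlap until none remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # Drop reads fully contained in another read; they add no new sequence.
    contigs = [r for r in contigs if not any(r != other and r in other for other in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, j in permutations(range(len(contigs)), 2):
            olen = overlap(contigs[i], contigs[j], min_len)
            if olen > best_len:
                best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: return the separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = shred("ATGCGTACGGATC", read_len=7, step=3)
print(reads)                   # ['ATGCGTA', 'CGTACGG', 'ACGGATC']
print(greedy_assemble(reads))  # ['ATGCGTACGGATC']
```

Merging the longest overlap first is easy to follow, but it can misassemble genomes with repeats longer than the fragments. Practical assemblers therefore use graph-based approaches, such as overlap graphs or de Bruijn graphs.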
The Maxam–Gilbert sequencing method is based on the chemical modification of DNA
molecules and subsequent cleavage at specific bases. In the Maxam–Gilbert method, the DNA
is first denatured, meaning its two strands are separated into single-stranded DNA (ssDNA)
molecules by heating or by a helicase enzyme. The ssDNA is then run through gel
electrophoresis, which separates the two DNA strands into two bands. Either band (strand)
can be cut from the gel and sequenced. In the sequencing step, the ssDNA solution is divided
into four reaction tubes labeled A+G, G, C+T, and C. The ssDNA in each tube is radioactively
labeled and treated with a chemical that breaks the DNA strand at specific nucleotides,
according to the tube label. After the reactions, the four samples are run on a
polyacrylamide gel in four separate lanes (A+G, G,